Multiple Access Channels with Combined Cooperation and Partial Cribbing
In this paper we study the multiple access channel (MAC) with combined
cooperation and partial cribbing and characterize its capacity region.
Cooperation means that the two encoders send a message to one another via a
rate-limited link prior to transmission, while partial cribbing means that each
of the two encoders obtains a deterministic function of the other encoder's
output with or without delay. Prior work in this field dealt separately with
cooperation and partial cribbing. However, by combining these two methods we
can achieve significantly higher rates. Remarkably, the capacity region does
not require an additional auxiliary random variable (RV) since the purpose of
both cooperation and partial cribbing is to generate a common message between
the encoders. In the proof we combine methods of block Markov coding, backward
decoding, double rate-splitting, and joint typicality decoding. Furthermore, we
present the Gaussian MAC with combined one-sided cooperation and quantized
cribbing. For this model, we give an achievability scheme that shows how many
cooperation or quantization bits are required to approach the capacity region
of the Gaussian MAC with full cooperation/cribbing. After establishing our main
results, we consider two cases where only one auxiliary RV is needed. The first
is a rate distortion dual setting for the MAC with a common message, a private
message and combined cooperation and cribbing. The second is a state-dependent
MAC with cooperation, where the state is known at a partially cribbing encoder
and at the decoder. However, there are cases where more than one auxiliary RV
is needed, e.g., when the cooperation and cribbing are not used for the same
purposes. We present a MAC with an action-dependent state, where the action is
based on the cooperation but not on the cribbing. Therefore, in this case more
than one auxiliary RV is needed.
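As a point of reference, the cooperation and partial-cribbing ingredients described above can be stated compactly. The notation below is a hedged sketch of the standard setup implied by the abstract (rate-limited conference links, deterministic cribbing functions), not copied from the paper:

```latex
% Cooperation: before transmission, the encoders exchange conference
% messages over rate-limited links of capacities C_{12} and C_{21}.
% Partial cribbing: at time i, each encoder observes a deterministic
% function of the other encoder's channel input, either strictly
% causally (with delay) or causally (without delay):
\[
  Z_{1,i} = g_1(X_{1,i}), \qquad Z_{2,i} = g_2(X_{2,i}),
\]
\[
  X_{1,i} = f_{1,i}\bigl(M_1, M_{21}, Z_2^{\,i-1}\bigr), \qquad
  X_{2,i} = f_{2,i}\bigl(M_2, M_{12}, Z_1^{\,i-1}\bigr).
\]
```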
Towards Optimal Compression: Joint Pruning and Quantization
Compression of deep neural networks has become a necessary stage for
optimizing model inference on resource-constrained hardware. This paper
presents FITCompress, a method for unifying layer-wise mixed precision
quantization and pruning under a single heuristic, as an alternative to neural
architecture search and Bayesian-based techniques. FITCompress combines the
Fisher Information Metric with path planning through the compression space to
pick optimal configurations given size and operation constraints, with
single-shot fine-tuning. Experiments on ImageNet validate the method and show that our
approach yields a better trade-off between accuracy and efficiency when
compared to the baselines. Besides computer vision benchmarks, we experiment
with the BERT model on a language understanding task, paving the way towards
its optimal compression.
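FITCompress's actual heuristic (the Fisher Information Metric combined with path planning through the compression space) is more involved than can be shown here. The following is a minimal illustrative sketch of the general idea, assuming a diagonal Fisher approximation (mean squared gradients) as a per-layer sensitivity score and a greedy search over bitwidths under a size budget; the function names and the greedy strategy are assumptions, not the authors' algorithm:

```python
import numpy as np

def fisher_sensitivity(grads):
    """Diagonal Fisher approximation: mean squared gradient per layer."""
    return {name: float(np.mean(g ** 2)) for name, g in grads.items()}

def quant_error(w, bits):
    """MSE of uniform symmetric quantization of `w` at `bits` bits."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    wq = np.round(w / scale) * scale
    return float(np.mean((w - wq) ** 2))

def allocate_bits(weights, grads, candidates=(2, 4, 8), budget_bits=None):
    """Greedy allocation: start every layer at the widest candidate and
    repeatedly narrow the layer whose sensitivity-weighted error increase
    is smallest, until the total bit budget is met."""
    fisher = fisher_sensitivity(grads)
    alloc = {name: max(candidates) for name in weights}
    total = lambda a: sum(a[n] * weights[n].size for n in a)
    if budget_bits is None:
        return alloc
    while total(alloc) > budget_bits:
        best, best_cost = None, float("inf")
        for name, w in weights.items():
            lower = [b for b in candidates if b < alloc[name]]
            if not lower:
                continue
            b = max(lower)  # next narrower candidate for this layer
            cost = fisher[name] * (quant_error(w, b) - quant_error(w, alloc[name]))
            if cost < best_cost:
                best_cost, best = cost, (name, b)
        if best is None:
            break  # budget unreachable even at the lowest bitwidth
        alloc[best[0]] = best[1]
    return alloc
```

Under a tight budget, layers with small Fisher scores are narrowed first, which is the intuition behind Fisher-guided mixed precision.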
FBM: Fast-Bit Allocation for Mixed-Precision Quantization
Quantized neural networks are well known for reducing latency, power
consumption, and model size without significant degradation in accuracy, making
them highly applicable for systems with limited resources and low power
requirements.
Mixed precision quantization offers better utilization of customized hardware
that supports arithmetic operations at different bitwidths. Existing
mixed-precision schemes rely on a large exploration space, resulting in a
large carbon footprint. In addition, these bit-allocation strategies mostly
constrain the model size rather than optimizing for the performance of the
network as deployed on specific hardware. Our work proposes Fast-Bit
Allocation for Mixed-Precision Quantization (FBM), which finds an optimal
bitwidth allocation by measuring desired behaviors through a simulation of a
specific device, or even on a physical one.
While dynamic transitions of bit allocation in mixed precision quantization
with ultra-low bitwidth are known to suffer from performance degradation, we
present a fast recovery solution from such transitions.
A comprehensive evaluation of the proposed method on CIFAR-10 and ImageNet
demonstrates our method's superiority over current state-of-the-art schemes in
terms of the trade-off between neural network accuracy and hardware efficiency.
Our source code, experimental settings and quantized models are available at
https://github.com/RamorayDrake/FBM
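The core idea above, allocating bitwidths against behavior measured on a simulated (or physical) device rather than against model size alone, can be sketched as follows. The toy latency model, the sensitivity scores, and the greedy ratio rule are all illustrative assumptions, not FBM's actual procedure:

```python
def simulate_latency(alloc, macs_per_layer, cycles_per_mac):
    """Toy stand-in for a device simulator: total latency is the sum
    over layers of MAC count times cycles-per-MAC at that bitwidth."""
    return sum(macs_per_layer[n] * cycles_per_mac[alloc[n]] for n in alloc)

def allocate_for_latency(layers, candidates, cycles_per_mac, sensitivity, budget):
    """Start at the widest bitwidth everywhere; repeatedly narrow the
    layer with the best latency-saved-per-unit-sensitivity ratio until
    the simulated latency fits the budget."""
    alloc = {n: max(candidates) for n in layers}
    while simulate_latency(alloc, layers, cycles_per_mac) > budget:
        best, best_ratio = None, -1.0
        for n in layers:
            lower = [b for b in candidates if b < alloc[n]]
            if not lower:
                continue
            b = max(lower)  # next narrower candidate
            saved = layers[n] * (cycles_per_mac[alloc[n]] - cycles_per_mac[b])
            ratio = saved / (sensitivity[n] + 1e-12)
            if ratio > best_ratio:
                best_ratio, best = ratio, (n, b)
        if best is None:
            raise ValueError("budget unreachable even at the lowest bitwidth")
        alloc[best[0]] = best[1]
    return alloc
```

Because the cost model reflects the target device rather than parameter counts, a layer that is cheap to narrow on that hardware is narrowed first, mirroring the hardware-aware motivation stated in the abstract.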
Jet Single Shot Detection
We apply object detection techniques based on Convolutional Neural Networks to jet reconstruction and identification at the CERN Large Hadron Collider. In particular, we focus on CaloJet reconstruction, representing each event as an image composed of calorimeter cells and using a Single Shot Detection network, called Jet-SSD. The model performs simultaneous localization and classification and additional regression tasks to measure jet features. We investigate Ternary Weight Networks with weights constrained to {-1, 0, 1} times layer- and channel-dependent scaling factors. We show that the quantized version of the network closely matches the performance of its full-precision equivalent.
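The weight constraint described above, {-1, 0, 1} times a per-channel scaling factor, can be sketched with the common Ternary Weight Network heuristic (threshold at 0.7 times the mean absolute weight, scale equal to the mean magnitude of the kept weights). The threshold heuristic is an assumption drawn from the TWN literature, not a detail taken from this paper:

```python
import numpy as np

def ternarize(w, thresh_factor=0.7):
    """Ternarize a weight tensor per output channel (axis 0):
    each channel becomes alpha * t with t in {-1, 0, +1}.
    Threshold and scale follow the common TWN heuristic
    (delta = 0.7 * mean|w|, alpha = mean|w| over kept weights);
    this heuristic is an assumption, not taken from the paper."""
    out = np.empty_like(w)
    for c in range(w.shape[0]):  # channel-dependent scaling factor
        wc = w[c]
        delta = thresh_factor * np.mean(np.abs(wc))
        t = np.where(np.abs(wc) > delta, np.sign(wc), 0.0)
        kept = np.abs(wc[t != 0])
        alpha = kept.mean() if kept.size else 0.0
        out[c] = alpha * t
    return out
```

Each channel of the result takes only three distinct values, which is what enables multiplier-free inference on suitable hardware.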
Lightweight jet reconstruction and identification as an object detection task
We apply object detection techniques based on deep convolutional blocks to end-to-end jet identification and reconstruction tasks encountered at the CERN Large Hadron Collider (LHC). Collision events produced at the LHC and represented as an image composed of calorimeter and tracker cells are given as an input to a Single Shot Detection network. The algorithm, named PFJet-SSD, performs simultaneous localization, classification and regression tasks to cluster jets and reconstruct their features. This all-in-one single feed-forward pass gives advantages in execution time and improved accuracy with respect to traditional rule-based methods. A further gain is obtained from network slimming, homogeneous quantization, and optimized runtime for meeting the memory and latency constraints of a typical real-time processing environment. We experiment with 8-bit and ternary quantization, benchmarking their accuracy and inference latency against a single-precision floating-point baseline. We show that the ternary network closely matches the performance of its full-precision equivalent and outperforms the state-of-the-art rule-based algorithm. Finally, we report the inference latency on different hardware platforms and discuss future applications.
ISSN: 2632-215